Definitions:

The Value of an Entry in a Row and a Column

is the Data Entered in that Cell. 

The Dictionary for a Column has an Entry 

for Each Different Value in the Column. 

The Width of a Column is the Number of 

Bits Used to Specify its Entries. 

The Cardinality of a Column is the Number of

Different Values in the Rows. 
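
The definitions above can be illustrated on a small column. This is a minimal sketch, not part of the figure: the sample data, the function names, and the assumption that cells are fixed-width UTF-8 strings padded to the longest value are all hypothetical.

```python
# Hypothetical sample column; each list element is the value of one row's cell.
column = ["red", "blue", "red", "green", "blue", "red"]


def dictionary_of(column):
    """Return one entry for each different value in the column (sorted)."""
    return sorted(set(column))


def cardinality(column):
    """Return the number of different values in the rows."""
    return len(set(column))


def width_bits(column):
    """Return the cell width in bits.

    Assumption: every cell is stored as a fixed-width UTF-8 string padded
    to the length of the longest value in the column.
    """
    return 8 * max(len(value.encode("utf-8")) for value in column)


print(dictionary_of(column))
print(cardinality(column))
print(width_bits(column))
```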

Given: Table Column with n Rows and Cells k Bits Wide 
Determine Cardinality m of Column, Where
m is Number of Different Entries in Column 




Create Dictionary for Column Entries 
Dictionary has m Rows and Width k Bits 
Dictionary Line Numbers have Width w Bits
Such that if p = 2^w then m < p
Rewrite Column Using Dictionary References
Reset Column Width to w Bits 
Check that w is Minimum, Where w = log2 p and p > m
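
The steps above (determine m, create the dictionary, rewrite the column, check that w is minimum) can be sketched as follows. This is an illustration under stated assumptions, not the figure's own implementation: the function names are hypothetical, dictionary line numbers are taken to start at 1, and the strict condition m < 2^w is used exactly as the figure states it.

```python
def min_ref_width(m):
    """Return the smallest w such that 2**w > m (the condition m < p, p = 2^w).

    For a positive integer m, int.bit_length() is exactly that value.
    """
    return m.bit_length()


def dictionary_encode(column):
    """Rewrite a column as dictionary line references.

    Returns (dictionary, references, w), where references holds one
    1-based dictionary line number per row and w is the reference width
    in bits. Sorting the dictionary is an assumption of this sketch.
    """
    dictionary = sorted(set(column))   # one entry per different value
    m = len(dictionary)                # cardinality of the column
    w = min_ref_width(m)               # minimum width of a line number
    line_number = {value: i + 1 for i, value in enumerate(dictionary)}
    references = [line_number[value] for value in column]
    return dictionary, references, w


dictionary, references, w = dictionary_encode(["red", "blue", "red", "green"])
print(dictionary, references, w)
```

With n rows of k bits each, the rewritten column needs n * w bits for references plus m * k bits for the dictionary, so the saving grows as m becomes small relative to n.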
Given: Table Columns C1 and C2, Each with n Rows
C1 is a List of Document IDs d1i, for i = 1, ..., n
C2 is a List of Document IDs d2i, for i = 1, ..., n
The Document ID of a Column is the Row Number. 
Define Dictionaries D1 and D2 for C1 and C2
Dictionaries are Ordered by Value IDs 
(Alphanumeric by Doc Contents). 
The Value ID for a Value in a Dictionary is the Row 
Number of its Entry in the Dictionary. 
Define Combined Dictionary D12 Listing Pairs [d1i, d2i] of

Doc IDs from Columns C1 and C2 for Each i = 1, ..., n

and Ordered by Respective Value IDs 
(Alphanumeric Order for d1 then for d2) 
Create Combined Column C12 Using Dictionary D12
Line References and Ordered by Doc ID Pairs for
i = 1, ..., n (C12 Order is Same as C1 and C2)
Delete Columns C1 and C2
All Their Info is Now in D1,D2, D12 
and New Combined Column C12 
Memory Footprint of Combined Column C12
Plus Dictionaries is Generally Much Less Than 
Memory Footprint of Columns C1 and C2 
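
The combined-column steps above can be sketched as follows. This is one possible reading of the figure, not a definitive implementation: it assumes D12 stores each distinct pair of value IDs once, that line numbers start at 1, and all function names and sample data are hypothetical.

```python
def encode(values):
    """Return (dictionary, value_ids) with 1-based value IDs per row."""
    dictionary = sorted(set(values))
    ids = {value: i + 1 for i, value in enumerate(dictionary)}
    return dictionary, [ids[value] for value in values]


def combine_columns(c1, c2):
    """Build D1, D2, the combined dictionary D12 and the combined column C12."""
    d1, ids1 = encode(c1)
    d2, ids2 = encode(c2)
    pairs = list(zip(ids1, ids2))      # one value-ID pair per row i
    d12 = sorted(set(pairs))           # ordered by d1 value ID, then d2
    line = {pair: i + 1 for i, pair in enumerate(d12)}
    c12 = [line[pair] for pair in pairs]   # same row order as C1 and C2
    return d1, d2, d12, c12


def decode(d1, d2, d12, c12):
    """Recover the original columns, showing that C1 and C2 can be deleted."""
    pairs = [d12[ref - 1] for ref in c12]
    c1 = [d1[v1 - 1] for v1, _ in pairs]
    c2 = [d2[v2 - 1] for _, v2 in pairs]
    return c1, c2


# Hypothetical sample data with correlated columns.
c1 = ["DE", "DE", "US", "US", "DE"]
c2 = ["Berlin", "Berlin", "Boston", "Boston", "Munich"]
d1, d2, d12, c12 = combine_columns(c1, c2)
print(d12, c12)
```

The footprint benefit depends on how strongly the two columns are correlated: D12 has at most as many lines as there are distinct pairs actually present in the rows, which can be far fewer than the product of the two cardinalities.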



FIG. 2A 